CUIRRE: An open-source library for load balancing and characterizing irregular applications on GPUs

Authors

  • Tao Zhang
  • Wei Shu
  • Min-You Wu
Abstract

While Graphics Processing Units (GPUs) deliver high performance on problems with regular structure, they perform poorly on irregular tasks because irregular problem structures are mismatched with SIMD-like GPU architectures. In this paper, we introduce a new library, CUIRRE, for improving the performance of irregular applications on GPUs. CUIRRE reduces the load imbalance among GPU threads that results from irregular loop structures. In addition, CUIRRE can characterize irregular applications in terms of their irregularity, thread granularity, and GPU utilization. We employ this library to characterize and optimize both synthetic and real-world applications. The experimental results show that the centralized task pool approach in the library achieves a performance improvement of 1.63× on average and up to 2.76×, at a 4.57% average overhead with static loading ratios. To avoid the cost of exhaustively searching for loading ratios, an adaptive loading ratio method is proposed that derives appropriate loading ratios for different inputs automatically at runtime. Our task pool approach outperforms other load balancing schemes such as the task stealing method and the persistent threads method. The CUIRRE library can easily be applied to many other irregular problems.

Corresponding author: Tao Zhang, 3-121 SEIEE Building, Shanghai Jiao Tong University, 800 Dong Chuan Road, Min Hang District, Shanghai 200240, China; Phone: +8613918520935

Preprint submitted to Journal of Parallel and Distributed Computing, May 13, 2014


Similar resources

LB_Migrate: A Dynamic Load Balancing Library

The design of a general-purpose dynamic load balancing library for a vast variety of parallel applications is more challenging than the design of a static partitioning library. The dynamic load balancing library needs to be implemented in parallel with the application and must utilize memory efficiently, so that the application scalability is not affected. This paper studies the need for a dyna...

Parleda: a Library for Parallel Processing in Computational Geometry Applications

ParLeda is a software library that provides the basic primitives needed for parallel implementation of computational geometry applications. It can also be used in implementing a parallel application that uses geometric data structures. The parallel model that we use is based on a new heterogeneous parallel model named HBSP, which is based on BSP and is introduced here. ParLeda uses two main lib...

Efficient Dynamic Multiple GPGPU Layer for OpenCV

General purpose graphic processing unit (GPGPU) provides high performance resource for computing. CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) permit writing of parallel computing programs that utilize multiple central processing units (CPU) and GPGPUs. The image processing library, OpenCV (Open Source Computer Vision library), may benefit greatly from paralle...

Dynamic Load Balancing Strategies for Graph Applications on GPUs

Acceleration of graph applications on GPUs has found large interest due to the ubiquitous use of graph processing in various domains. The inherent irregularity in graph applications leads to several challenges for parallelization. A key challenge, which we address in this paper, is that of load imbalance. If the work-assignment to threads uses node-based graph partitioning, it can result in skew...

Virtual Data Space - A Universal Load Balancing Scheme

The Virtual Data Space is a standard C-library which automatically distributes the work-packets generated by parallel applications across the processing nodes. VDS is a universal system offering load-balancing mechanisms for applications which incorporate independent load-items and scheduling algorithms for those which comprise precedence-constraints between their different tasks. This paper prese...


Journal title:
  • J. Parallel Distrib. Comput.

Volume 74  Issue 

Pages  -

Publication date 2014